Using Roget's Thesaurus to Determine the Similarity of Texts
Similar resources
A Method for Extracting Keywords and Weighting Words to Improve Persian Text Classification
Due to the ever-increasing expansion of information and the huge number of unstructured documents, keywords play a very important role in information retrieval. Because manual extraction of keywords faces various challenges, automated extraction seems inevitable. This research attempts to use a thesaurus (a structured word net) to extract them automatically. A...
Analysis and Construction of Noun Hypernym Hierarchies to Enhance Roget’s Thesaurus
Lexical resources are machine-readable dictionaries or lists of words, where semantic relationships between the terms are somehow expressed. These lexical resources have been used for many tasks such as word sense disambiguation and determining semantic similarity between terms. In recent years some research has been put into automatically building lexical resources from large corpora. In this ...
Determining Word Sense Dominance Using a Thesaurus
The degree of dominance of a sense of a word is the proportion of occurrences of that sense in text. We propose four new methods to accurately determine word sense dominance using raw text and a published thesaurus. Unlike the McCarthy et al. (2004) system, these methods can be used on relatively small target texts, without the need for a similarly-sense-distributed auxiliary text. We perform an...
Towards a Universal Method for Measuring Semantic Textual Similarity
Semantic textual similarity (STS) measures the degree to which two texts share the same meaning. In Natural Language Processing, STS touches many different aspects, from thesaurus generation to machine translation. However, methods for measuring STS have often been developed only for very specific types of texts, such as for comparing two words or for comparing paragraphs. This disjoint approac...
Cross-Lingual Document Similarity Calculation Using the Multilingual Thesaurus EUROVOC
We are presenting an approach to calculating the semantic similarity of documents written in the same or in different languages. The similarity calculation is achieved by representing the document contents in a language-independent way, using the descriptor terms of the multilingual thesaurus EUROVOC, and by then calculating the distance between these representations. While EUROVOC is a careful...